Expression of emotions is a crucial part of daily human communication. Emotion recognition in conversations (ERC) is an emerging field of study, where the primary task is to identify the emotion behind each utterance in a conversation. Though a lot of work has been done on ERC in the past, these works focus only on ERC in the English language, thereby ignoring other languages. In this paper, we present Multilingual MELD (M-MELD), where we extend the Multimodal EmotionLines Dataset (MELD) \cite{poria2018meld} to 4 other languages beyond English, namely Greek, Polish, French, and Spanish. Beyond just establishing strong baselines for all of these 4 languages, we also propose a novel architecture, DiscLSTM, that uses both sequential and conversational discourse context in a conversational dialogue for ERC. Our proposed approach is computationally efficient, can transfer across languages using just a cross-lingual encoder, and achieves better performance than most uni-modal text approaches in the literature on both MELD and M-MELD. We make our data and code publicly available on GitHub.
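Emotion recognition (ER) aims to classify human utterances into different emotion categories. Based on early fusion and self-attention-based multimodal interaction between the text and acoustic modalities, in this paper we propose a multimodal multitask learning approach for ER from individual utterances in isolation. Experiments on the IEMOCAP benchmark show that our proposed model performs better than our re-implementation of the latest state of the art, and achieves better performance than all other unimodal and multimodal approaches in the literature. In addition, strong baselines and ablation studies demonstrate the effectiveness of our proposed approach. We make all our code publicly available on GitHub.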
Psychology research has long explored aspects of human personality such as extroversion, agreeableness and emotional stability. Categorizations like the `Big Five' personality traits are commonly used to assess and diagnose personality types. In this work, we explore the question of whether the perceived personality in language models is exhibited consistently in their language generation. For example, is a language model such as GPT2 likely to respond in a consistent way if asked to go out to a party? We also investigate whether such personality traits can be controlled. We show that when provided different types of contexts (such as personality descriptions, or answers to diagnostic questions about personality traits), language models such as BERT and GPT2 can consistently identify and reflect personality markers in those contexts. This behavior illustrates an ability to be manipulated in a highly predictable way, and frames them as tools for identifying personality traits and controlling personas in applications such as dialog systems. We also contribute a crowd-sourced data-set of personality descriptions of human subjects paired with their `Big Five' personality assessment data, and a data-set of personality descriptions collated from Reddit.
Many real-world applications of language models (LMs), such as code autocomplete and writing assistance, involve human-LM interaction, but the main LM benchmarks are non-interactive, where a system produces output without human intervention. To evaluate human-LM interaction, we develop a framework, Human-AI Language-based Interaction Evaluation (HALIE), that expands non-interactive evaluation along three dimensions, capturing (i) the interactive process, not only the final output; (ii) the first-person subjective experience, not just a third-party assessment; and (iii) notions of preference beyond quality. We then design five tasks ranging from goal-oriented to open-ended to capture different forms of interaction. On four state-of-the-art LMs (three variants of OpenAI's GPT-3 and AI21's J1-Jumbo), we find that better non-interactive performance does not always result in better human-LM interaction and that first-person and third-party metrics can diverge, suggesting the importance of examining the nuances of human-LM interaction.
Bike sharing systems often suffer from poor capacity management as a result of variable demand. These systems would benefit from models that predict demand in order to moderate the number of bikes stored at each station. In this paper, we attempt to apply a graph neural network model to predict bike demand in the New York City Citi Bike dataset.
A hallmark of human intelligence is the ability to learn new concepts purely from language. Several recent approaches have explored training machine learning models via natural language supervision. However, these approaches fall short in leveraging linguistic quantifiers (such as 'always' or 'rarely') and mimicking humans in compositionally learning complex tasks. Here, we present LaSQuE, a method that can learn zero-shot classifiers from language explanations by using three new strategies - (1) modeling the semantics of linguistic quantifiers in explanations (including exploiting ordinal strength relationships, such as 'always' > 'likely'), (2) aggregating information from multiple explanations using an attention-based mechanism, and (3) model training via curriculum learning. With these strategies, LaSQuE outperforms prior work, showing an absolute gain of up to 7% in generalizing to unseen real-world classification tasks.
Large Language Models (LLMs) have been the subject of active research, significantly advancing the field of Natural Language Processing (NLP). From BERT to BLOOM, LLMs have surpassed state-of-the-art results in various natural language tasks such as question answering, summarization, and text generation. Many ongoing efforts focus on understanding LLMs' capabilities, including their knowledge of the world, syntax, and semantics. However, extending the textual prowess of LLMs to symbolic reasoning has been slow and predominantly focused on tackling problems related to the mathematical field. In this paper, we explore the use of LLMs for automated planning, a branch of AI concerned with the realization of action sequences (plans) to achieve a goal, typically executed by intelligent agents, autonomous robots, and unmanned vehicles. We introduce Plansformer, an LLM fine-tuned on planning problems and capable of generating plans with favorable behavior in terms of correctness and length with reduced knowledge-engineering efforts. We also demonstrate the adaptability of Plansformer in solving different planning domains with varying complexities, owing to the transfer learning abilities of LLMs. For one configuration of Plansformer, we achieve ~97% valid plans, out of which ~95% are optimal, for Towers of Hanoi, a puzzle-solving domain.
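To make the distinction between "valid" and "optimal" plans concrete, the following is a minimal sketch of how a generated Towers of Hanoi plan could be checked. It is not the Plansformer evaluation code. The plan representation (a list of source/destination peg pairs), the function names, and the assumption that all disks start on peg 0 and must end on peg 2 are illustrative choices, not details taken from the abstract. The optimality test relies on the standard result that the shortest solution for n disks on three pegs has 2^n - 1 moves.

```python
from typing import List, Tuple

# Hypothetical plan format: each move is (source peg, destination peg), pegs 0..2.
Move = Tuple[int, int]


def is_valid_plan(num_disks: int, plan: List[Move]) -> bool:
    """Return True if the plan legally moves every disk from peg 0 to peg 2."""
    # Each peg is a stack; larger numbers are larger disks, last item is the top.
    goal = list(range(num_disks, 0, -1))
    pegs = [list(goal), [], []]
    for src, dst in plan:
        # Reject unknown pegs, no-op moves (an assumption of this sketch),
        # and moves from an empty peg.
        if src not in (0, 1, 2) or dst not in (0, 1, 2) or src == dst:
            return False
        if not pegs[src]:
            return False
        disk = pegs[src][-1]
        # A disk may not be placed on top of a smaller disk.
        if pegs[dst] and pegs[dst][-1] < disk:
            return False
        pegs[dst].append(pegs[src].pop())
    return pegs[2] == goal


def is_optimal_plan(num_disks: int, plan: List[Move]) -> bool:
    """A valid plan is optimal if it uses the minimum 2**n - 1 moves."""
    return is_valid_plan(num_disks, plan) and len(plan) == 2 ** num_disks - 1


if __name__ == "__main__":
    shortest = [(0, 1), (0, 2), (1, 2)]
    detour = [(0, 2), (2, 1), (0, 2), (1, 2)]
    print(is_valid_plan(2, shortest), is_optimal_plan(2, shortest))
    print(is_valid_plan(2, detour), is_optimal_plan(2, detour))
```

Under this reading, the reported figures would mean that roughly 97% of generated plans pass a check like `is_valid_plan`, and roughly 95% of those also pass `is_optimal_plan`.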
Chatbots, or bots for short, are multi-modal collaborative assistants that can help people complete useful tasks. When chatbots are referenced in connection with elections, they often draw negative reactions due to the fear of misinformation and hacking. Instead, in this paper, we explore how chatbots may be used to promote voter participation in vulnerable segments of society like senior citizens and first-time voters. In particular, we build a system that amplifies official information while personalizing it to users' unique needs transparently. We discuss its design, build prototypes with frequently asked questions (FAQ) election information for two US states that are low on an ease-of-voting scale, and report on its initial evaluation in a focus group. Our approach can be a win-win for voters, election agencies trying to fulfill their mandate, and democracy at large.
This paper presents a new approach for analyzing and identifying potentially useful generalized plans. It presents a new conceptual framework along with an algorithmic process for assessing termination and reachability related properties of generalized plans. The presented framework builds upon classic results on the analysis of graphs to decompose generalized plans into smaller components in a novel algorithm for conducting a hierarchical analysis for termination of arbitrary generalized plans. Theoretical analysis of the new framework establishes soundness of the presented algorithms and shows how it goes beyond existing approaches; empirical analysis illustrates the scope of this approach. Our analysis shows that this new approach can effectively identify termination for a significantly larger class of generalized plans than was possible using existing methods.
The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi along two dimensions: (a) the creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources, and an Interactive Voice Response (IVR) based mass awareness platform; and (b) enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly 4 times to enable its deployment on low-resource edge devices and in areas of little to no internet connectivity. We also present preliminary evaluations of utilizing the developed machine translation model to provide assistance to volunteers who are involved in collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used for building the translation model but also engaged nearly 850 community members who can help take Gondi onto the internet.